Abstract — We consider a distributed stochastic approximation
(SA) scheme for computing an equilibrium of a
stochastic Nash game. Standard SA schemes employ diminishing
steplength sequences that are square summable but not
summable. Such requirements provide little or no guidance
for how to leverage Lipschitzian and monotonicity properties
of the problem, and naive choices (such as $\gamma_k = 1/k$) generally
do not perform uniformly well on a breadth of problems.
While a centralized adaptive stepsize SA scheme is proposed in 
[1] for the optimization framework, such a scheme provides 
no freedom for the agents in choosing their own stepsizes. 
Thus, a direct application of centralized stepsize schemes is 
impractical in solving Nash games. Furthermore, extensions 
to game-theoretic regimes where players may independently 
choose steplength sequences are limited to recent work by 
Koshal et al. [2]. Motivated by these shortcomings, we present 
a distributed algorithm in which each player updates his 
steplength based on the previous steplength and some problem 
parameters. The steplength rules are derived from minimizing 
an upper bound of the errors associated with players' decisions. 
It is shown that these rules generate sequences that converge 
almost surely to an equilibrium of the stochastic Nash game. 
Importantly, variants of this rule are suggested where players 
independently select steplength sequences while abiding by 
an overall coordination requirement. Preliminary numerical 
results are seen to be promising. 

I. Introduction 

We consider a class of stochastic Nash games in which ev- 
ery player solves a stochastic convex program parametrized 
by adversarial strategies. Consider an N-person stochastic
Nash game in which the ith player solves the parametrized
convex problem

$$\min_{x_i \in X_i} \; \mathbb{E}[f_i(x_i, x_{-i}, \xi_i)], \qquad (1)$$

where $x_{-i}$ denotes the collection $\{x_j : j \neq i\}$ of decisions
of all players other than player i. For each i, the vector
$\xi_i : \Omega_i \to \mathbb{R}^{n_i}$ is a random vector with a probability
distribution on some set, while the function $\mathbb{E}[f_i(x_i, x_{-i}, \xi_i)]$
is strongly convex in $x_i$ for all $x_{-i} \in \prod_{j \neq i} X_j$. For every
i, the set $X_i \subseteq \mathbb{R}^{n_i}$ is closed and convex. We focus
on the resulting stochastic variational inequality (VI) and 
consider the development of distributed stochastic approxi- 
mation schemes that rely on adaptive steplength sequences. 
Stochastic approximation techniques have a long tradition. 
First proposed by Robbins and Monro [3] for differentiable
functions and later studied by Ermoliev [4]-[6], such schemes
have been the subject of significant theoretical and algorithmic
study (cf. [7], [8]). Yet, there has been markedly little work
on the application of such techniques to the solution of stochastic
variational inequalities. Exceptions include the work by Jiang
and Xu [9], and more recently by Koshal et al. [2]. The 
latter, in particular, develops a single timescale stochastic 
approximation scheme for precisely the class of problems 
being studied here viz. monotone stochastic Nash games. 

Standard stochastic approximation schemes provide little 
guidance regarding the choice of a steplength sequence, apart 
from requiring that the sequence, denoted by $\{\gamma_k\}$, satisfies
$\sum_{k=1}^{\infty} \gamma_k = \infty$ and $\sum_{k=1}^{\infty} \gamma_k^2 < \infty$. This paper is motivated
by the need to develop adaptive steplength sequences that
can be independently chosen by players under limited
coordination, while guaranteeing the overall convergence of
the scheme. Adaptive stepsizes have been effectively used 
in gradient and subgradient algorithms. Vrahatis et al. [10] 
presented a class of gradient algorithms with adaptive stepsizes
for unconstrained minimization. Spall [11] developed a
general adaptive SA algorithm based on a simultaneous
perturbation approach for estimating the Hessian matrix.
Cicek et al. [12] considered the Kiefer-Wolfowitz (KW) SA 
algorithm and derived general upper bounds on its mean- 
squared error, together with an adaptive version of the KW 
algorithm. Ram et al. [13] considered distributed stochastic 
subgradient algorithms for convex optimization problems and 
studied the effects of stochastic errors on the convergence 
of the proposed algorithm. Lizarraga et al. [14] considered
a family of two-person multi-plant games and developed
Stackelberg-Nash equilibrium conditions based on the Robust
Maximum Principle. More recently, Yousefian et al.
[1], [15] developed centralized adaptive stepsize SA schemes 
for solving stochastic optimization problems and variational 
inequalities. The main contribution of the current paper lies
in developing a class of distributed adaptive stepsize rules
for SA schemes in which each agent chooses its own stepsizes
without any specific information about other agents' stepsize
policies. This degree of freedom in choosing the stepsizes has
not been addressed by the centralized schemes.

Before proceeding, we briefly motivate the question of 
distributed computation of Nash equilibria from two different 
standpoints: (i) First, the Nash game can be viewed as a competitive
analog of a stochastic multi-user convex optimization
problem of the form $\min_{x \in X} \sum_{i=1}^{N} \mathbb{E}[f_i(x, \xi_i)]$.
Furthermore, under the assumption that equilibria of the
associated stochastic Nash game are efficient, our scheme
provides a distributed framework for computing solutions
to this problem. In such a setting, we may prescribe that
players employ stochastic approximation schemes since the 
Nash game represents an engineered construct employed for 
computing solutions; (ii) A second perspective is one drawn 
from a bounded rationality approach towards distributed 
computation of Nash equilibria. A fully rational avenue 
for computing equilibria suggests that each player employs 
a best response mapping in updating strategies, based on 
what the competing players are doing. Yet, when faced with
computational or time constraints, players may instead take a 
gradient step. We work in precisely this regime but allow for 
flexibility in terms of the steplengths chosen by the players. 

In this paper, we consider the solution of a stochastic Nash 
game whose equilibria are completely captured by a stochas- 
tic variational inequality with a strongly monotone mapping. 
Motivated by the need for efficient distributed simulation 
methods for computing solutions to such problems, we 
present a distributed scheme in which each player employs an 
adaptive rule for prescribing steplengths. Importantly, these 
rules can be implemented with relatively little coordination 
by any given player and collectively lead to iterates that are 
shown to converge to the unique equilibrium in an almost- 
sure sense. 

This paper is organized as follows. In Section II, we
introduce the formulation of a stochastic Nash game in
which every player solves a stochastic convex problem. In
Section III, we show the almost-sure convergence of the 
SA algorithm under specified assumptions. In Section IV, 
motivated by minimizing a suitably defined error bound, 
we develop an adaptive steplength stochastic approximation 
framework in which every player adaptively updates his 
steplength. It is shown that the choice of adaptive steplength 
rules can be obtained independently by each player under
limited coordination. Finally, in Section V, we provide some
numerical results from a stochastic flow management game 
drawn from a communication network setting. 

Notation: Throughout this paper, a vector x is assumed to 
be a column vector. We write $x^T$ to denote the transpose of a
vector x. $\|x\|$ denotes the Euclidean vector norm, i.e., $\|x\| = \sqrt{x^T x}$.
We use $\Pi_X(x)$ to denote the Euclidean projection of
a vector x on a set X, i.e., $\|x - \Pi_X(x)\| = \min_{y \in X} \|x - y\|$.
A vector g is a subgradient of a convex function f with domain
dom f at $\bar{x} \in \mathrm{dom}\, f$ when $f(\bar{x}) + g^T(x - \bar{x}) \le f(x)$ for all
$x \in \mathrm{dom}\, f$. The set of all subgradients of f at $\bar{x}$ is denoted by
$\partial f(\bar{x})$. We write a.s. as the abbreviation for "almost surely",
and use $\mathbb{E}[z]$ to denote the expectation of a random variable z.
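As a concrete illustration of the projection $\Pi_X$, assume (for this sketch only) that X is a box. Minimizing $\|x - y\|$ over a box decouples across coordinates, so the projection is a coordinate-wise clip; for a Cartesian product $X = \prod_i X_i$ of boxes, it is the concatenation of the per-player projections. The helper names are hypothetical.

```python
def project_box(x, lower, upper):
    """Euclidean projection Pi_X(x) onto the box X = {y : lower <= y <= upper}.

    The problem min_{y in X} ||x - y|| separates by coordinate, so each
    coordinate is clipped to its own interval independently.
    """
    return [min(max(xj, lo), hi) for xj, lo, hi in zip(x, lower, upper)]


def dist(a, b):
    """Euclidean distance ||a - b||."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


x = [2.0, -1.0, 0.5]
lower, upper = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
p = project_box(x, lower, upper)
print(p)  # [1.0, 0.0, 0.5]
```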

II. Problem formulation 

In this section, we present (sufficient) conditions associ- 
ated with equilibrium points of the stochastic Nash game 
defined by (1). The equilibrium conditions of this game 
can be characterized by a stochastic variational inequality 
problem denoted by VI(X, F), where 



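$$F(x) \triangleq \begin{pmatrix} \nabla_{x_1} \mathbb{E}[f_1(x, \xi_1)] \\ \vdots \\ \nabla_{x_N} \mathbb{E}[f_N(x, \xi_N)] \end{pmatrix}, \qquad X \triangleq \prod_{i=1}^{N} X_i, \qquad (2)$$

with $x = (x_1, \dots, x_N)^T$ and $x_i \in X_i \subseteq \mathbb{R}^{n_i}$ for i =
1, ..., N. Given a set $X \subseteq \mathbb{R}^n$ and a single-valued mapping
$F : X \to \mathbb{R}^n$, a vector $x^* \in X$ solves the variational
inequality VI(X, F) if

$$(x - x^*)^T F(x^*) \ge 0 \quad \text{for all } x \in X. \qquad (3)$$

Let $n = \sum_{i=1}^{N} n_i$, and note that when the sets $X_i$ are convex
and closed for all i, the set $X \subseteq \mathbb{R}^n$ is closed and convex.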
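The fixed-point characterization $x^* = \Pi_X(x^* - \gamma F(x^*))$, used in the proof of Lemma 2, yields a practical residual for checking (3). The sketch below takes γ = 1 and assumes a box X together with an affine test mapping $F(x) = x - a$. Both are illustrative choices that do not come from the paper. For this strongly monotone F, the solution of VI(X, F) is $\Pi_X(a)$.

```python
def natural_residual(x, F, lower, upper):
    """Residual ||x - Pi_X(x - F(x))|| for a box X; zero exactly at solutions of VI(X, F)."""
    fx = F(x)
    proj = [min(max(xj - fj, lo), hi) for xj, fj, lo, hi in zip(x, fx, lower, upper)]
    return sum((xj - pj) ** 2 for xj, pj in zip(x, proj)) ** 0.5


# Hypothetical example: F(x) = x - a on X = [0, 1]^2 with a = (2, 0.5).
A_VEC = (2.0, 0.5)


def F(x):
    return [xj - aj for xj, aj in zip(x, A_VEC)]


LOWER, UPPER = (0.0, 0.0), (1.0, 1.0)
x_star = [1.0, 0.5]  # equals Pi_X(a), the unique solution for this F
print(natural_residual(x_star, F, LOWER, UPPER))      # 0.0
print(natural_residual([0.0, 0.0], F, LOWER, UPPER))  # positive: (0, 0) is not a solution
```

Because X is a box in this example, checking (3) at the vertices of X is also sufficient: $(x - x^*)^T F(x^*)$ is linear in x, so its minimum over X is attained at a vertex.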

In the context of solving the stochastic variational inequality
VI(X, F) in (2)–(3), suppose each player employs
a stochastic approximation scheme given by

$$x_{k+1,i} = \Pi_{X_i}\big(x_{k,i} - \gamma_{k,i}(F_i(x_k) + w_{k,i})\big), \qquad w_{k,i} \triangleq \nabla_{x_i} f_i(x_k, \xi_{k,i}) - F_i(x_k), \qquad (4)$$

for all k ≥ 0 and i = 1, ..., N, where $\gamma_{k,i} > 0$ is the stepsize
of the ith player at iteration k, $x_k = (x_{k,1}; x_{k,2}; \dots; x_{k,N})$,
$\xi_k = (\xi_{k,1}; \xi_{k,2}; \dots; \xi_{k,N})$, and $F_i(x) = \mathbb{E}[\nabla_{x_i} f_i(x, \xi_i)]$.
Note that, by the definitions of $w_{k,i}$, $F_i$, and the history $\mathcal{F}_k$
(introduced in Section III), $\mathbb{E}[w_{k,i} \mid \mathcal{F}_k] = 0$. In addition, $x_0 \in X$ is a random initial
vector independent of the random variable ξ and such that
$\mathbb{E}[\|x_0\|^2] < \infty$. Note that each player uses its individual
stepsize to update its decision.
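The sketch below shows scheme (4) on a small, hypothetical two-player game that does not appear in the paper. Player i minimizes $\mathbb{E}[0.5x_i^2 + 0.2x_ix_j - \xi_ix_i]$ over $X_i = [0, 10]$, where $\mathbb{E}[\xi_i] = a_i$. The resulting mapping is $F_i(x) = x_i + 0.2x_j - a_i$, which is strongly monotone with η = 0.8 and Lipschitz with L = 1.2. Each player takes its own step $\gamma_{k,i} = \theta_i/(k+1)$ with $\theta = (1, 1.5)$, so the ratio in Assumption 2b is β = 0.5 < η/L. In the code, each player sees only a noisy estimate of its own partial gradient, and all players update simultaneously from the same $x_k$.

```python
import random

# Hypothetical game (illustration only): F_i(x) = x_i + 0.2*x_j - a_i on [0, 10]^2.
A_MEAN = (2.0, 3.0)
BOUNDS = ((0.0, 10.0), (0.0, 10.0))


def sample_grad(i, x, rng):
    """Sampled partial gradient grad_{x_i} f_i(x, xi_i) = F_i(x) + w_i."""
    j = 1 - i
    xi_i = A_MEAN[i] + rng.gauss(0.0, 1.0)  # noisy parameter with mean a_i
    return x[i] + 0.2 * x[j] - xi_i


def sa_step(x, gammas, rng):
    """One iteration of scheme (4): player i uses its own stepsize gammas[i]."""
    new_x = []
    for i, (lo, hi) in enumerate(BOUNDS):
        step = x[i] - gammas[i] * sample_grad(i, x, rng)
        new_x.append(min(max(step, lo), hi))  # projection onto X_i = [lo, hi]
    return new_x


def run(num_iters=20000, thetas=(1.0, 1.5), seed=0):
    """Run the scheme with player-specific steps gamma_{k,i} = theta_i / (k + 1)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for k in range(num_iters):
        gammas = [t / (k + 1) for t in thetas]
        x = sa_step(x, gammas, rng)
    return x


# Unique equilibrium: solves x_1 + 0.2*x_2 = 2 and 0.2*x_1 + x_2 = 3 (interior point).
X_STAR = (1.4 / 0.96, 2.6 / 0.96)
x_final = run()
```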

III. A Distributed SA scheme 

In this section, we present conditions under which algo- 
rithm (4) converges almost surely to the solution of game 
(1) under suitable assumptions on the mapping. Also, we 
develop a distributed variant of a standard stochastic ap- 
proximation scheme and provide conditions on the steplength 
sequences that lead to almost-sure convergence of the iterates 
to the unique solution. Our assumptions include requirements 
on the set X and the mapping F. 

Assumption 1: Assume the following: 

(a) The sets Xi C R ni are closed and convex. 

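(b) F is strongly monotone with constant η > 0 and
Lipschitz continuous with constant L over the set X, i.e.,
$(F(x) - F(y))^T(x - y) \ge \eta\|x - y\|^2$ and $\|F(x) - F(y)\| \le L\|x - y\|$ for all x, y ∈ X.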

Remark: The strong monotonicity is assumed to hold
throughout the paper. Although the convergence results
may still hold under a weaker assumption, such as strict
monotonicity, the stepsize policy in this paper leverages
the strong monotonicity parameter, which yields a more finely
parametrized stepsize rule. This is the main reason that we
assume the stronger version of monotonicity. In Section V,
we present an example where such an assumption is satisfied.

Another set of assumptions concerns the stepsizes employed
by each player in algorithm (4).

Assumption 2: Assume that: 

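(a) The stepsize sequences are such that $\gamma_{k,i} > 0$ for all
k and i, with $\sum_{k=0}^{\infty} \gamma_{k,i} = \infty$ and $\sum_{k=0}^{\infty} \gamma_{k,i}^2 < \infty$.

(b) There exists a scalar β such that $0 \le \beta < \frac{\eta}{L}$ and

$$\frac{\Gamma_k - \delta_k}{\delta_k} \le \beta \quad \text{for all } k \ge 0,$$

where $\delta_k$ and $\Gamma_k$ are (fixed) positive sequences satisfying $\delta_k \le \min_{i=1,\dots,N} \gamma_{k,i}$
and $\Gamma_k \ge \max_{i=1,\dots,N} \gamma_{k,i}$ for all k ≥ 0.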
We let $\mathcal{F}_k$ denote the history of the method up to time k,
i.e., $\mathcal{F}_k = \{x_0, \xi_0, \xi_1, \dots, \xi_{k-1}\}$ for k ≥ 1 and $\mathcal{F}_0 = \{x_0\}$.
Consider the following assumption on the stochastic errors
$w_k$ of the algorithm.

Assumption 3: The errors $w_k$ are such that for some
constant ν > 0,

$$\mathbb{E}[\|w_k\|^2 \mid \mathcal{F}_k] \le \nu^2 \quad \text{a.s. for all } k \ge 0.$$
We use the Robbins-Siegmund lemma in establishing the 
convergence of method (4), which can be found in [16] 
(cf. Lemma 10, page 49). 

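Lemma 1: Let $\{v_k\}$ be a sequence of nonnegative random
variables, where $\mathbb{E}[v_0] < \infty$, and let $\{\alpha_k\}$ and $\{\mu_k\}$ be
deterministic scalar sequences such that:

$$\mathbb{E}[v_{k+1} \mid v_0, \dots, v_k] \le (1 - \alpha_k)v_k + \mu_k \quad \text{a.s. for all } k \ge 0,$$
$$0 \le \alpha_k \le 1, \qquad \mu_k \ge 0,$$
$$\sum_{k=0}^{\infty} \alpha_k = \infty, \qquad \sum_{k=0}^{\infty} \mu_k < \infty, \qquad \lim_{k \to \infty} \frac{\mu_k}{\alpha_k} = 0.$$

Then, $v_k \to 0$ almost surely.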

The following lemma provides an error bound for algo- 
rithm (4) under Assumption 1. 

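Lemma 2: Consider algorithm (4). Let Assumption 1
hold, and let $\delta_k$ and $\Gamma_k$ be positive sequences with $\delta_k \le \min_i \gamma_{k,i}$
and $\Gamma_k \ge \max_i \gamma_{k,i}$. Then, the following relation holds a.s. for all k ≥ 0:

$$\mathbb{E}[\|x_{k+1} - x^*\|^2 \mid \mathcal{F}_k] \le \Gamma_k^2\, \mathbb{E}[\|w_k\|^2 \mid \mathcal{F}_k] + \left(1 - 2(\eta + L)\delta_k + 2L\Gamma_k + L^2\Gamma_k^2\right)\|x_k - x^*\|^2. \qquad (5)$$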
Proof: By Assumption 1a, the set X is closed and
convex. Since F is strongly monotone, the existence and
uniqueness of the solution to VI(X, F) is guaranteed by Theorem
2.3.3 of [17]. Let x* denote the solution of VI(X, F).
From properties of the projection operator, we know that a vector
x* solves VI(X, F) if and only if x* satisfies

$$x^* = \Pi_X(x^* - \gamma F(x^*)) \quad \text{for any } \gamma > 0.$$

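Since X is the Cartesian product of the sets $X_i$, this characterization
holds componentwise, i.e., $x_i^* = \Pi_{X_i}(x_i^* - \gamma_i F_i(x^*))$ for any $\gamma_i > 0$.
From algorithm (4) and the non-expansiveness property of
the projection operator, we have for all k ≥ 0 and i,

$$\|x_{k+1,i} - x_i^*\|^2 = \left\|\Pi_{X_i}\big(x_{k,i} - \gamma_{k,i}(F_i(x_k) + w_{k,i})\big) - \Pi_{X_i}\big(x_i^* - \gamma_{k,i}F_i(x^*)\big)\right\|^2 \le \left\|x_{k,i} - x_i^* - \gamma_{k,i}\big(F_i(x_k) + w_{k,i} - F_i(x^*)\big)\right\|^2.$$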

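Taking the expectation conditioned on the past, and using
$\mathbb{E}[w_{k,i} \mid \mathcal{F}_k] = 0$, we have

$$\mathbb{E}[\|x_{k+1,i} - x_i^*\|^2 \mid \mathcal{F}_k] \le \|x_{k,i} - x_i^*\|^2 + \gamma_{k,i}^2\|F_i(x_k) - F_i(x^*)\|^2 + \gamma_{k,i}^2\, \mathbb{E}[\|w_{k,i}\|^2 \mid \mathcal{F}_k] - 2\gamma_{k,i}(x_{k,i} - x_i^*)^T(F_i(x_k) - F_i(x^*)).$$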



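Now, by summing the preceding relations over i, we have

$$\mathbb{E}[\|x_{k+1} - x^*\|^2 \mid \mathcal{F}_k] \le \|x_k - x^*\|^2 + \underbrace{\sum_{i=1}^{N} \gamma_{k,i}^2\|F_i(x_k) - F_i(x^*)\|^2}_{\text{Term 1}} + \sum_{i=1}^{N} \gamma_{k,i}^2\, \mathbb{E}[\|w_{k,i}\|^2 \mid \mathcal{F}_k] \underbrace{- 2\sum_{i=1}^{N} \gamma_{k,i}(x_{k,i} - x_i^*)^T(F_i(x_k) - F_i(x^*))}_{\text{Term 2}}. \qquad (6)$$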

Next, we estimate Term 1 and Term 2 in (6). By using the
definition of $\Gamma_k$ and by leveraging the Lipschitzian property
of mapping F, we obtain

$$\text{Term 1} \le \Gamma_k^2\|F(x_k) - F(x^*)\|^2 \le \Gamma_k^2 L^2\|x_k - x^*\|^2. \qquad (7)$$

Adding and subtracting $2\sum_{i=1}^{N} \delta_k(x_{k,i} - x_i^*)^T(F_i(x_k) - F_i(x^*))$
in Term 2, we further obtain

$$\text{Term 2} = -2\delta_k(x_k - x^*)^T(F(x_k) - F(x^*)) - 2\sum_{i=1}^{N}(\gamma_{k,i} - \delta_k)(x_{k,i} - x_i^*)^T(F_i(x_k) - F_i(x^*)).$$

By the Cauchy-Schwarz inequality, we obtain

$$\begin{aligned} \text{Term 2} &\le -2\delta_k(x_k - x^*)^T(F(x_k) - F(x^*)) + 2\sum_{i=1}^{N}(\gamma_{k,i} - \delta_k)\|x_{k,i} - x_i^*\|\,\|F_i(x_k) - F_i(x^*)\| \\ &\le -2\delta_k(x_k - x^*)^T(F(x_k) - F(x^*)) + 2(\Gamma_k - \delta_k)\|x_k - x^*\|\,\|F(x_k) - F(x^*)\|, \end{aligned}$$

where in the last relation we use $0 \le \gamma_{k,i} - \delta_k \le \Gamma_k - \delta_k$ and Hölder's inequality.
Invoking the strong monotonicity of the mapping to bound the
first term and the Lipschitzian property of the mapping to bound the
second term of the preceding relation, we have

$$\text{Term 2} \le -2\eta\delta_k\|x_k - x^*\|^2 + 2(\Gamma_k - \delta_k)L\|x_k - x^*\|^2.$$

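The desired inequality is obtained by combining relations (6)
and (7) with the preceding inequality, and by noting that
$\sum_{i=1}^{N} \gamma_{k,i}^2\, \mathbb{E}[\|w_{k,i}\|^2 \mid \mathcal{F}_k] \le \Gamma_k^2\, \mathbb{E}[\|w_k\|^2 \mid \mathcal{F}_k]$ since $\gamma_{k,i} \le \Gamma_k$. ■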
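For readability, the coefficient in (5) can be traced explicitly. The leading term of (6) and the bounds on Term 1 and Term 2 add up as follows:

$$\begin{aligned} \|x_k - x^*\|^2 + \text{Term 1} + \text{Term 2} &\le \left(1 + L^2\Gamma_k^2 - 2\eta\delta_k + 2L(\Gamma_k - \delta_k)\right)\|x_k - x^*\|^2 \\ &= \left(1 - 2(\eta + L)\delta_k + 2L\Gamma_k + L^2\Gamma_k^2\right)\|x_k - x^*\|^2. \end{aligned}$$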

We next prove that algorithm (4) generates a sequence
of iterates that converges a.s. to the unique solution of
VI(X, F), as seen in the following proposition. Our proof
of this result makes use of Lemma 2.

Proposition 1 (Almost-sure convergence): Consider
algorithm (4). Let Assumptions 1, 2, and 3 hold. Then,

(a) The following relation holds for all k ≥ 0:

$$\mathbb{E}[\|x_{k+1} - x^*\|^2] \le (1 + \beta)^2\delta_k^2\nu^2 + \left(1 - 2(\eta - \beta L)\delta_k + (1 + \beta)^2L^2\delta_k^2\right)\mathbb{E}[\|x_k - x^*\|^2].$$

(b) The sequence $\{x_k\}$ generated by algorithm (4) converges
a.s. to the unique solution of VI(X, F).

Proof: (a) Assumption 2b implies that $\Gamma_k \le (1 + \beta)\delta_k$.
Combining this with inequality (5), we obtain

$$\mathbb{E}[\|x_{k+1} - x^*\|^2 \mid \mathcal{F}_k] \le \left(1 - 2(\eta - \beta L)\delta_k + (1 + \beta)^2L^2\delta_k^2\right)\|x_k - x^*\|^2 + (1 + \beta)^2\delta_k^2\, \mathbb{E}[\|w_k\|^2 \mid \mathcal{F}_k] \quad \text{for all } k \ge 0.$$

Taking expectations in the preceding inequality and using
Assumption 3, we obtain the desired relation.
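(b) We show that the conditions of Lemma 1 are satisfied in
order to claim almost-sure convergence of $x_k$ to $x^*$. Let us
define $v_k \triangleq \|x_k - x^*\|^2$, $\alpha_k \triangleq 2(\eta - \beta L)\delta_k - L^2\delta_k^2(1 + \beta)^2$,
and $\mu_k \triangleq (1 + \beta)^2\delta_k^2\nu^2$. By the conditional inequality in the proof of
part (a) and Assumption 3, $\mathbb{E}[v_{k+1} \mid \mathcal{F}_k] \le (1 - \alpha_k)v_k + \mu_k$ a.s., and hence the same
bound holds when conditioning on $v_0, \dots, v_k$. Since $\gamma_{k,i}$ tends to zero
for every i = 1, ..., N (by Assumption 2a) and $0 < \delta_k \le \min_i \gamma_{k,i}$, we may conclude that $\delta_k$ goes to zero
as k grows. Recall that $\alpha_k$ is given by

$$\alpha_k = 2(\eta - \beta L)\delta_k\left(1 - \frac{(1 + \beta)^2L^2\delta_k}{2(\eta - \beta L)}\right).$$

Since $\beta < \eta/L$ (Assumption 2b), it follows that $\eta - \beta L > 0$. Due to $\delta_k \to 0$, for all k large enough, say $k \ge k_1$, we have

$$1 - \frac{(1 + \beta)^2L^2\delta_k}{2(\eta - \beta L)} \ge \frac{1}{2},$$

and thus $\alpha_k \ge (\eta - \beta L)\delta_k > 0$. Also, for k large enough, say $k \ge k_2$, we
have $\alpha_k < 1$. Therefore, when $k \ge \max\{k_1, k_2\}$ we have
$0 < \alpha_k < 1$. Moreover, Assumption 2b gives $\delta_k \ge \Gamma_k/(1 + \beta) \ge \gamma_{k,i}/(1 + \beta)$, so
$\sum_k \delta_k = \infty$ by Assumption 2a, and hence $\sum_k \alpha_k = \infty$. Obviously, $v_k, \mu_k \ge 0$. Since $\delta_k \le \gamma_{k,i}$,
Assumption 2a also gives $\sum_k \delta_k^2 < \infty$, so that $\sum_k \mu_k < \infty$. We also have

$$\lim_{k \to \infty} \frac{\mu_k}{\alpha_k} = \lim_{k \to \infty} \frac{(1 + \beta)^2\delta_k^2\nu^2}{2(\eta - \beta L)\delta_k - L^2\delta_k^2(1 + \beta)^2} = \lim_{k \to \infty} \frac{(1 + \beta)^2\delta_k\nu^2}{2(\eta - \beta L) - L^2(1 + \beta)^2\delta_k} = 0,$$

since $\delta_k \to 0$. Hence, the conditions of Lemma 1 are satisfied for $k \ge \max\{k_1, k_2\}$
(discarding finitely many initial terms does not affect the conclusion), which implies that
$x_k$ converges to the unique solution $x^*$ almost surely. ■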
Consider now a special form of algorithm (4) corresponding
to the case when all players employ the same stepsize,
i.e., $\gamma_{k,i} = \gamma_k$ for all k and i. Then, algorithm (4) reduces to
the following:

$$x_{k+1} = \Pi_X\big(x_k - \gamma_k(F(x_k) + w_k)\big), \qquad w_k \triangleq F(x_k, \xi_k) - F(x_k), \qquad (8)$$

for all k ≥ 0, where $F(x, \xi)$ denotes the sampled mapping with components $\nabla_{x_i} f_i(x, \xi_i)$.
Observe that when $\gamma_{k,i} = \gamma_k$ for all k and i, Assumption 2a is satisfied when $\sum_{k=0}^{\infty} \gamma_k = \infty$ and $\sum_{k=0}^{\infty} \gamma_k^2 < \infty$.
Assumption 2b is automatically satisfied with $\Gamma_k = \delta_k = \gamma_k$
and β = 0. Hence, as a direct consequence of Proposition 1,
we have the following corollary.

Corollary 1 (Identical step sizes): Consider algorithm (8).
Let Assumptions 1 and 3 hold. Also, let $\sum_{k=0}^{\infty} \gamma_k = \infty$ and
$\sum_{k=0}^{\infty} \gamma_k^2 < \infty$. Then,

(a) The following relation holds for all k ≥ 0:

$$\mathbb{E}[\|x_{k+1} - x^*\|^2] \le (1 - 2\eta\gamma_k + L^2\gamma_k^2)\,\mathbb{E}[\|x_k - x^*\|^2] + \gamma_k^2\nu^2.$$

(b) The sequence $\{x_k\}$ generated by algorithm (8) converges
a.s. to the unique solution of VI(X, F).
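To see how the bound in Corollary 1(a) behaves under the harmonic rule discussed in Section IV, the sketch iterates the bound with equality using $\gamma_k = \theta/(k+1)$, which is the rule θ/k with the index shifted to start at k = 0. The constants η = 0.8, L = 1.2, ν = 1, and $e_0 = 1$ are assumed for illustration only. They are not taken from the paper's experiments.

```python
def corollary1_bound(theta, eta=0.8, L=1.2, nu=1.0, e0=1.0, num_iters=100_000):
    """Iterate the bound of Corollary 1(a) with equality for gamma_k = theta / (k + 1).

    e_{k+1} = (1 - 2*eta*gamma_k + L**2 * gamma_k**2) * e_k + gamma_k**2 * nu**2
    """
    e = e0
    for k in range(num_iters):
        g = theta / (k + 1)
        e = (1 - 2 * eta * g + L ** 2 * g ** 2) * e + g ** 2 * nu ** 2
    return e


for theta in (0.1, 1.0, 10.0):
    print(f"theta = {theta:5.1f}: bound after 1e5 steps = {corollary1_bound(theta):.3e}")
```

With these constants, θ = 0.1 gives 2ηθ < 1, so the bound decays only like a small negative power of k. θ = 1 gives roughly O(1/k) decay. θ = 10 first inflates the bound, because its early contraction coefficients exceed 1, before the bound contracts. This is one way to see why a single untuned θ may not perform uniformly well.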



IV. A DISTRIBUTED ADAPTIVE STEPLENGTH SA SCHEME 

Stochastic approximation algorithms require stepsize se- 
quences to be square summable but not summable. These 
algorithms provide little advice regarding the choice of such 
sequences. One of the most common choices has been the
harmonic steplength rule, which takes the form $\gamma_k = \theta/k$,
where θ > 0 is a constant. Although this choice guarantees
almost-sure convergence, it does not leverage problem parameters.
Numerically, it has been observed that such choices
can perform quite poorly in practice. Motivated by this short- 
coming, we present a distributed adaptive steplength scheme 
for algorithm (4) which guarantees almost-sure convergence 
of x k to the unique solution of VI(X, F). It is derived from 
the minimizer of a suitably defined error bound and leads to 
a recursive relation; more specifically, at each step, the new 
stepsize is calculated using the stepsize from the preceding 
iteration and problem parameters. To begin our analysis, we
consider the result of Proposition 1(a) for all k ≥ 0:

$$\mathbb{E}[\|x_{k+1} - x^*\|^2] \le (1 + \beta)^2\delta_k^2\nu^2 + \left(1 - 2(\eta - \beta L)\delta_k + (1 + \beta)^2L^2\delta_k^2\right)\mathbb{E}[\|x_k - x^*\|^2]. \qquad (9)$$
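When the stepsizes are further restricted so that

$$0 < \delta_k \le \frac{\eta - \beta L}{(1 + \beta)^2L^2},$$

we have

$$1 - 2(\eta - \beta L)\delta_k + (1 + \beta)^2L^2\delta_k^2 \le 1 - (\eta - \beta L)\delta_k.$$

Thus, for $0 < \delta_k \le \frac{\eta - \beta L}{(1 + \beta)^2L^2}$, from inequality (9) we obtain

$$\mathbb{E}[\|x_{k+1} - x^*\|^2] \le (1 - (\eta - \beta L)\delta_k)\,\mathbb{E}[\|x_k - x^*\|^2] + (1 + \beta)^2\delta_k^2\nu^2 \quad \text{for all } k \ge 0. \qquad (10)$$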
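The restriction on $\delta_k$ enters only through the quadratic term. For $\delta_k > 0$,

$$(1 + \beta)^2L^2\delta_k^2 \le (\eta - \beta L)\delta_k \iff \delta_k \le \frac{\eta - \beta L}{(1 + \beta)^2L^2},$$

so that

$$1 - 2(\eta - \beta L)\delta_k + (1 + \beta)^2L^2\delta_k^2 \le 1 - 2(\eta - \beta L)\delta_k + (\eta - \beta L)\delta_k = 1 - (\eta - \beta L)\delta_k.$$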

Let us view the quantity $\mathbb{E}[\|x_{k+1} - x^*\|^2]$ as an error $e_{k+1}$
of the method arising from the use of the stepsize values
$\delta_0, \delta_1, \dots, \delta_k$. Relation (10) gives us an estimate of the error
of algorithm (4). We use this estimate to develop an adaptive
stepsize procedure. Consider the worst case, in which
(10) holds with equality. In this worst case, the error
satisfies the following recursive relation:

$$e_{k+1} = (1 - (\eta - \beta L)\delta_k)e_k + (1 + \beta)^2\delta_k^2\nu^2.$$

Let us assume that we want to run algorithm (4)
for a fixed number of iterations, say K. The preceding
relation shows that $e_K$ depends on the stepsize values up
to the Kth iteration. This motivates us to view the stepsize
parameters as decision variables chosen to minimize a suitably
defined error bound of the algorithm. Thus, the variables are
$\delta_0, \delta_1, \dots, \delta_{K-1}$ and the objective function is the error function
$e_K(\delta_0, \delta_1, \dots, \delta_{K-1})$. We proceed to derive a stepsize
rule by minimizing the error $e_{k+1}$; importantly, $\delta_{k+1}$ can be
shown to be a function of only the most recent stepsize $\delta_k$.
We define the real-valued error function $e_k(\delta_0, \delta_1, \dots, \delta_{k-1})$
by the upper bound in (10):

$$e_{k+1}(\delta_0, \dots, \delta_k) = (1 - (\eta - \beta L)\delta_k)\,e_k(\delta_0, \dots, \delta_{k-1}) + (1 + \beta)^2\nu^2\delta_k^2 \quad \text{for all } k \ge 0, \qquad (11)$$

where $e_0$ is a positive scalar, η is the strong monotonicity
parameter, and $\nu^2$ is the upper bound for the second moments
of the error norms $\|w_k\|$.

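Now, let us consider the stepsize sequence given by

$$\delta_0^* = \frac{\eta - \beta L}{2(1 + \beta)^2\nu^2}\, e_0, \qquad (12)$$

$$\delta_k^* = \delta_{k-1}^*\left(1 - \frac{\eta - \beta L}{2}\,\delta_{k-1}^*\right) \quad \text{for all } k \ge 1. \qquad (13)$$

In what follows, we often abbreviate $e_k(\delta_0, \dots, \delta_{k-1})$ by $e_k$
whenever this is unambiguous. The next proposition shows
that the lower-bound sequence of $\gamma_{k,i}$ given by (12)–(13)
minimizes the errors $e_k$ over $\left(0, \frac{\eta - \beta L}{(1 + \beta)^2L^2}\right]^k$.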
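The recursion (12)–(13) is cheap to run: each new step depends only on the previous one. The sketch below generates $\delta_k^*$ for assumed illustrative constants (η = 0.8, L = 1.2, β = 0.2, ν = 1, $e_0 = 1$), chosen so that β < η/L and $e_0 < 2\nu^2/L^2$. It then checks that the steps stay positive, nonincreasing, and inside the interval above.

```python
def delta_star(num_terms, eta, L, beta, nu, e0):
    """Lower-bound adaptive steplengths delta_k^*, k = 0..num_terms-1, from (12)-(13)."""
    a = eta - beta * L                            # positive under Assumption 2b
    d = a * e0 / (2 * (1 + beta) ** 2 * nu ** 2)  # delta_0^*, eq. (12)
    seq = [d]
    for _ in range(num_terms - 1):
        d = d * (1 - 0.5 * a * d)                 # eq. (13): uses only the previous step
        seq.append(d)
    return seq


# Illustrative constants (assumed): beta < eta / L and e0 < 2 * nu**2 / L**2.
eta, L, beta, nu, e0 = 0.8, 1.2, 0.2, 1.0, 1.0
steps = delta_star(1000, eta, L, beta, nu, e0)
upper = (eta - beta * L) / ((1 + beta) ** 2 * L ** 2)  # right end of the interval above
```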

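Proposition 2: Let $e_k(\delta_0, \dots, \delta_{k-1})$ be defined as in (11),
where $e_0 > 0$ is such that $e_0 < \frac{2\nu^2}{L^2}$ and L is the Lipschitz
constant of mapping F. Let the sequence $\{\delta_k^*\}$ be given by
(12)–(13). Then, the following hold:

(a) $e_k(\delta_0^*, \dots, \delta_{k-1}^*) = \frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_k^*$ for all k ≥ 0.

(b) For any k ≥ 1, the vector $(\delta_0^*, \delta_1^*, \dots, \delta_{k-1}^*)$ is the
minimizer of the function $e_k(\delta_0, \dots, \delta_{k-1})$ over the set

$$G_k \triangleq \left\{\alpha \in \mathbb{R}^k : 0 < \alpha_j \le \frac{\eta - \beta L}{(1 + \beta)^2L^2},\ j = 1, \dots, k\right\},$$

i.e., for any k ≥ 1 and $(\delta_0, \dots, \delta_{k-1}) \in G_k$,

$$e_k(\delta_0, \dots, \delta_{k-1}) - e_k(\delta_0^*, \dots, \delta_{k-1}^*) \ge (1 + \beta)^2\nu^2(\delta_{k-1} - \delta_{k-1}^*)^2.$$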

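Proof: (a) To show the result, we use induction on k. Trivially,
it holds for k = 0 from (12). Now, suppose that we have
$e_k(\delta_0^*, \dots, \delta_{k-1}^*) = \frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_k^*$ for some k, and consider
the case for k + 1. From the definition of the error $e_k$ in (11)
and the inductive hypothesis, we have

$$\begin{aligned} e_{k+1}(\delta_0^*, \dots, \delta_k^*) &= (1 - (\eta - \beta L)\delta_k^*)\,\frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_k^* + (1 + \beta)^2\nu^2(\delta_k^*)^2 \\ &= \frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_k^*\left(1 - \frac{\eta - \beta L}{2}\,\delta_k^*\right) = \frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_{k+1}^*, \end{aligned}$$

where the last equality follows by the definition of $\delta_{k+1}^*$
in (13). Hence, the result holds for all k ≥ 0.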
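(b) First we need to show that $(\delta_0^*, \dots, \delta_{k-1}^*) \in G_k$. By the
choice of $e_0$, i.e., $e_0 < \frac{2\nu^2}{L^2}$, we have that $0 < \delta_0^* < \frac{\eta - \beta L}{(1 + \beta)^2L^2}$.
Using induction, from relations (12)–(13) it can be shown
that $0 < \delta_k^* \le \delta_{k-1}^*$ for all k ≥ 1. Thus, $(\delta_0^*, \dots, \delta_{k-1}^*) \in G_k$
for all k ≥ 1. Using induction on k, we now show that the vector
$(\delta_0^*, \dots, \delta_{k-1}^*)$ minimizes the error $e_k$ for all k ≥ 1. From
the definition of the error $e_1$ and the relation

$$e_1(\delta_0^*) = \frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_1^*$$

shown in part (a), we have

$$e_1(\delta_0) - e_1(\delta_0^*) = (1 - (\eta - \beta L)\delta_0)e_0 + (1 + \beta)^2\nu^2\delta_0^2 - \frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_1^*.$$

Using $\delta_1^* = \delta_0^*\left(1 - \frac{\eta - \beta L}{2}\,\delta_0^*\right)$, we obtain

$$e_1(\delta_0) - e_1(\delta_0^*) = (1 - (\eta - \beta L)\delta_0)e_0 + (1 + \beta)^2\nu^2\delta_0^2 - \frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_0^* + (1 + \beta)^2\nu^2(\delta_0^*)^2.$$

Since $e_0 = \frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_0^*$ by (12), we have

$$e_1(\delta_0) - e_1(\delta_0^*) = (1 + \beta)^2\nu^2\left(-2\delta_0^*\delta_0 + \delta_0^2 + (\delta_0^*)^2\right) = (1 + \beta)^2\nu^2(\delta_0 - \delta_0^*)^2,$$

and the claim holds for k = 1. Now, suppose that $e_k(\delta_0, \dots, \delta_{k-1}) \ge e_k(\delta_0^*, \dots, \delta_{k-1}^*)$ holds for
some k and any $(\delta_0, \dots, \delta_{k-1}) \in G_k$; we need to
show that $e_{k+1}(\delta_0, \dots, \delta_k) \ge e_{k+1}(\delta_0^*, \dots, \delta_k^*)$ holds for all
$(\delta_0, \dots, \delta_k) \in G_{k+1}$. To simplify the notation, we use $e_{k+1}^*$
to denote the error $e_{k+1}$ evaluated at $(\delta_0^*, \delta_1^*, \dots, \delta_k^*)$, and
$e_{k+1}$ when evaluating at an arbitrary vector $(\delta_0, \dots, \delta_k) \in G_{k+1}$.
Using (11) and part (a), we have

$$e_{k+1} - e_{k+1}^* = (1 - (\eta - \beta L)\delta_k)e_k + (1 + \beta)^2\nu^2\delta_k^2 - \frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_{k+1}^*.$$

Under the inductive hypothesis, we have $e_k \ge e_k^*$. It can be
shown easily that when $(\delta_0, \delta_1, \dots, \delta_k) \in G_{k+1}$, we have
$0 < 1 - (\eta - \beta L)\delta_k < 1$. Using this, the relation $e_k^* = \frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_k^*$
of part (a), and the definition of $\delta_{k+1}^*$, we obtain

$$e_{k+1} - e_{k+1}^* \ge (1 - (\eta - \beta L)\delta_k)\,\frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_k^* + (1 + \beta)^2\nu^2\delta_k^2 - \frac{2(1 + \beta)^2\nu^2}{\eta - \beta L}\,\delta_k^* + (1 + \beta)^2\nu^2(\delta_k^*)^2 = (1 + \beta)^2\nu^2(\delta_k - \delta_k^*)^2.$$

Hence, $e_k - e_k^* \ge (1 + \beta)^2\nu^2(\delta_{k-1} - \delta_{k-1}^*)^2$ holds for all
k ≥ 1 and all $(\delta_0, \dots, \delta_{k-1}) \in G_k$. ■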
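Both parts of Proposition 2 can be spot-checked numerically. The sketch below evaluates the error function (11) directly. It verifies the closed form of part (a), and it then samples random points of $G_K$ to confirm the quadratic gap of part (b). The constants (η = 0.8, L = 1.2, β = 0.2, ν = 1, $e_0 = 1$), the horizon K = 6, and the sample count are assumptions made for this illustration. The check is a sanity test, not a proof.

```python
import random


def delta_star(num_terms, eta, L, beta, nu, e0):
    """delta_k^* from (12)-(13)."""
    a = eta - beta * L
    d = a * e0 / (2 * (1 + beta) ** 2 * nu ** 2)
    seq = [d]
    for _ in range(num_terms - 1):
        d = d * (1 - 0.5 * a * d)
        seq.append(d)
    return seq


def error_fn(deltas, eta, L, beta, nu, e0):
    """e_k(delta_0, ..., delta_{k-1}) defined by the recursion (11)."""
    a = eta - beta * L
    e = e0
    for d in deltas:
        e = (1 - a * d) * e + (1 + beta) ** 2 * nu ** 2 * d ** 2
    return e


eta, L, beta, nu, e0 = 0.8, 1.2, 0.2, 1.0, 1.0   # assumed constants
a = eta - beta * L
upper = a / ((1 + beta) ** 2 * L ** 2)             # upper bound defining G_k
K = 6
opt = delta_star(K + 1, eta, L, beta, nu, e0)

# Part (a): e_k(delta^*) = 2 (1+beta)^2 nu^2 / (eta - beta L) * delta_k^*.
part_a_ok = all(
    abs(error_fn(opt[:k], eta, L, beta, nu, e0)
        - 2 * (1 + beta) ** 2 * nu ** 2 / a * opt[k]) < 1e-12
    for k in range(K + 1)
)

# Part (b): random points of G_K never beat delta^*, with the stated quadratic gap.
rng = random.Random(1)
best = error_fn(opt[:K], eta, L, beta, nu, e0)
gap_ok = True
for _ in range(2000):
    trial = [rng.uniform(1e-6, upper) for _ in range(K)]
    gap = error_fn(trial, eta, L, beta, nu, e0) - best
    if gap < (1 + beta) ** 2 * nu ** 2 * (trial[-1] - opt[K - 1]) ** 2 - 1e-12:
        gap_ok = False
```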
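We have just provided an analysis in terms of the lower
bound sequence $\{\delta_k\}$. We can conduct a similar analysis for
$\{\Gamma_k\}$ and obtain the corresponding adaptive stepsize scheme
using the following relation, which follows from (5), Assumption 3, and $\delta_k \ge \Gamma_k/(1 + \beta)$:

$$\mathbb{E}[\|x_{k+1} - x^*\|^2] \le \Gamma_k^2\nu^2 + \left(1 - \frac{2(\eta + L)}{1 + \beta}\,\Gamma_k + 2L\Gamma_k + L^2\Gamma_k^2\right)\mathbb{E}[\|x_k - x^*\|^2].$$

When $0 < \Gamma_k \le \frac{\eta - \beta L}{(1 + \beta)L^2}$, we have

$$\mathbb{E}[\|x_{k+1} - x^*\|^2] \le \left(1 - \frac{\eta - \beta L}{1 + \beta}\,\Gamma_k\right)\mathbb{E}[\|x_k - x^*\|^2] + \Gamma_k^2\nu^2 \quad \text{for all } k \ge 0. \qquad (14)$$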
Using relation (14) and following an approach similar to that of
Proposition 2, we obtain the sequence $\{\Gamma_k^*\}$ given by

$$\Gamma_0^* = \frac{\eta - \beta L}{2(1 + \beta)\nu^2}\, e_0, \qquad (15)$$

$$\Gamma_k^* = \Gamma_{k-1}^*\left(1 - \frac{\eta - \beta L}{2(1 + \beta)}\,\Gamma_{k-1}^*\right) \quad \text{for all } k \ge 1. \qquad (16)$$

Note that the adaptive stepsize sequence given by (15)–(16)
converges to zero and, moreover, is not summable
but is square summable (cf. [1], Proposition 3). In the following
lemma, we derive a relation between two recursive
sequences, which we use later to obtain our main recursive
stepsize scheme.

Lemma 3: Suppose that sequences $\{\lambda_k\}$ and $\{\gamma_k\}$ are
given by the following recursive equations for all k ≥ 0:

$$\lambda_{k+1} = \lambda_k(1 - \lambda_k), \qquad \gamma_{k+1} = \gamma_k(1 - c\gamma_k),$$

where $\lambda_0 = c\gamma_0$, $0 < \gamma_0 < \frac{1}{c}$, and c > 0. Then for all k ≥ 0,
$\lambda_k = c\gamma_k$.

Proof: We use induction on k. For k = 0, the relation
holds since $\lambda_0 = c\gamma_0$. Suppose that for some k ≥ 0 the
relation holds. Then, we have

$$\gamma_{k+1} = \gamma_k(1 - c\gamma_k) \;\Rightarrow\; c\gamma_{k+1} = c\gamma_k(1 - c\gamma_k) \;\Rightarrow\; c\gamma_{k+1} = \lambda_k(1 - \lambda_k) \;\Rightarrow\; c\gamma_{k+1} = \lambda_{k+1}. \qquad (17)$$

Hence, the result holds for k + 1, implying that the result
holds for all k ≥ 0. ■
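Lemma 3 is a change of scale. Every recursion $\gamma \mapsto \gamma(1 - c\gamma)$ becomes the normalized map $\lambda \mapsto \lambda(1 - \lambda)$ after multiplying by c. The quick numerical check below uses arbitrary illustrative values c = 2.5 and $\gamma_0 = 0.3$, which satisfy $0 < \gamma_0 < 1/c$.

```python
def lemma3_sequences(c, gamma0, num_terms):
    """Generate lambda_k and gamma_k of Lemma 3, starting from lambda_0 = c * gamma_0."""
    lam, gam = c * gamma0, gamma0
    lams, gams = [lam], [gam]
    for _ in range(num_terms - 1):
        lam = lam * (1 - lam)          # lambda_{k+1} = lambda_k (1 - lambda_k)
        gam = gam * (1 - c * gam)      # gamma_{k+1} = gamma_k (1 - c gamma_k)
        lams.append(lam)
        gams.append(gam)
    return lams, gams


c = 2.5
lams, gams = lemma3_sequences(c=c, gamma0=0.3, num_terms=50)
```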

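Next, we show a relation between the sequences $\{\delta_k^*\}$ and $\{\Gamma_k^*\}$.

Lemma 4: Suppose that sequences $\{\delta_k^*\}$ and $\{\Gamma_k^*\}$ are
given by relations (12)–(13) and (15)–(16), and $e_0 < \frac{2\nu^2}{L^2}$.
Then for all k ≥ 0, $\Gamma_k^* = (1 + \beta)\delta_k^*$.

Proof: Suppose that $\{\lambda_k\}$ is defined by $\lambda_{k+1} = \lambda_k(1 - \lambda_k)$ for all k ≥ 0, where $\lambda_0 = \frac{(\eta - \beta L)^2}{4(1 + \beta)^2\nu^2}\, e_0$. In what
follows, we apply Lemma 3 twice to obtain the result. By
the definitions of $\lambda_0$ and $\delta_0^*$, we have that $\lambda_0 = \frac{\eta - \beta L}{2}\,\delta_0^*$.
Also, using $e_0 < \frac{2\nu^2}{L^2}$, the definition of $\lambda_0$, and $0 \le \beta L < \eta \le L$, we obtain

$$\lambda_0 = \frac{(\eta - \beta L)^2}{4(1 + \beta)^2\nu^2}\, e_0 < \frac{(\eta - \beta L)^2}{2(1 + \beta)^2L^2} \le \frac{\eta^2}{2L^2} < 1.$$

Therefore, the conditions of Lemma 3 hold for sequences
$\{\lambda_k\}$ and $\{\delta_k^*\}$ with $c = \frac{\eta - \beta L}{2}$. Hence, Lemma 3 yields that for all k ≥ 0,

$$\lambda_k = \frac{\eta - \beta L}{2}\,\delta_k^*.$$

Similarly, invoking Lemma 3 again with $c = \frac{\eta - \beta L}{2(1 + \beta)}$, we have $\lambda_k = \frac{\eta - \beta L}{2(1 + \beta)}\,\Gamma_k^*$.
Therefore, from the two preceding relations, we
conclude that for all k ≥ 0,

$$\Gamma_k^* = (1 + \beta)\delta_k^*. \quad ■$$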
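Lemma 4 can be checked in the same way by running (12)–(13) and (15)–(16) side by side and comparing $\Gamma_k^*$ with $(1 + \beta)\delta_k^*$. The constants are again illustrative assumptions: η = 0.8, L = 1.2, β = 0.2, ν = 1, and $e_0 = 1 < 2\nu^2/L^2$.

```python
def delta_gamma_star(num_terms, eta, L, beta, nu, e0):
    """Run the lower-bound rule (12)-(13) and the upper-bound rule (15)-(16) together."""
    a = eta - beta * L
    d = a * e0 / (2 * (1 + beta) ** 2 * nu ** 2)   # delta_0^*, eq. (12)
    g = a * e0 / (2 * (1 + beta) * nu ** 2)        # Gamma_0^*, eq. (15)
    ds, gs = [d], [g]
    for _ in range(num_terms - 1):
        d = d * (1 - 0.5 * a * d)                  # eq. (13)
        g = g * (1 - 0.5 * a / (1 + beta) * g)     # eq. (16)
        ds.append(d)
        gs.append(g)
    return ds, gs


beta = 0.2
ds, gs = delta_gamma_star(500, eta=0.8, L=1.2, beta=beta, nu=1.0, e0=1.0)
```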

The preceding results are essentially adaptive rules for determining
the lower and upper bounds of the stepsize sequences,
i.e., $\{\delta_k^*\}$ and $\{\Gamma_k^*\}$. The next proposition proposes recursive
stepsize schemes for each player of game (1).

Proposition 3: [Distributed adaptive steplength SA rules]
Suppose that Assumptions 1 and 3 hold. Assume that the set
X is bounded, i.e., there exists a positive constant $D = \max_{x, y \in X}\|x - y\|$.
Suppose that the stepsizes for any player
i = 1, ..., N are given by the following recursive equations:

$$\gamma_{0,i} = r_i\, \frac{c}{\left(1 + \frac{\eta - 2c}{L}\right)^2\nu^2}\, D^2, \qquad (18)$$

$$\gamma_{k,i} = \gamma_{k-1,i}\left(1 - \frac{c}{r_i}\,\gamma_{k-1,i}\right) \quad \text{for all } k \ge 1, \qquad (19)$$

where $r_i$ is an arbitrary parameter associated with the ith player
such that $r_i \in \left[1, 1 + \frac{\eta - 2c}{L}\right]$, c is an arbitrary fixed constant with
$0 < c < \frac{\eta}{2}$, L is the Lipschitz constant of mapping F,
and ν is the upper bound given by Assumption 3, such that
$D < \frac{\sqrt{2}\,\nu}{L}$. Then, the following hold:

(a) $\frac{\gamma_{k,i}}{r_i} = \frac{\gamma_{k,j}}{r_j}$ for any i, j = 1, ..., N and k ≥ 0.

(b) Assumption 2b holds with $\beta = \frac{\eta - 2c}{L}$, $\delta_k = \delta_k^*$, $\Gamma_k = \Gamma_k^*$, and $e_0 = D^2$,
where $\delta_k^*$ and $\Gamma_k^*$ are given by (12)–(13) and (15)–(16), respectively.

(c) The sequence $\{x_k\}$ generated by algorithm (4) converges
a.s. to the unique solution of the stochastic VI(X, F).

(d) The results of Proposition 2 hold for $\delta_k^*$ when $e_0 = D^2$.
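Proof: (a) Consider the sequence $\{\lambda_k\}$ given by

$$\lambda_0 = \frac{c^2}{\left(1 + \frac{\eta - 2c}{L}\right)^2\nu^2}\, D^2, \qquad \lambda_{k+1} = \lambda_k(1 - \lambda_k) \quad \text{for all } k \ge 0.$$

Note that $\lambda_0 < 1$, since $c < \eta/2$, $D^2 < 2\nu^2/L^2$, and $\eta \le L$ give $\lambda_0 < \eta^2/(2L^2)$.
Since for any i = 1, ..., N we have $\lambda_0 = \frac{c}{r_i}\,\gamma_{0,i}$, applying
Lemma 3 with the constant $c/r_i$, we obtain that for any 1 ≤ i ≤ N and k ≥ 0,

$$\lambda_k = \frac{c}{r_i}\,\gamma_{k,i}.$$

Therefore, for any 1 ≤ i, j ≤ N, we obtain the desired
relation in part (a).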

(b) First we show that $\delta_k^*$ and $\Gamma_k^*$ are well defined. Consider
the relation of part (a). Let k ≥ 0 be arbitrarily fixed.
If $\gamma_{k,i} > \gamma_{k,j}$ for some i ≠ j, then we have $r_i > r_j$. Therefore, the
minimum possible $\gamma_{k,i}$ is obtained with $r_i = 1$ and the maximum possible
$\gamma_{k,i}$ is obtained with $r_i = 1 + \frac{\eta - 2c}{L}$. Now, consider (18)–(19). If $r_i = 1$,
$D^2$ is replaced by $e_0$, and c by $\frac{\eta - \beta L}{2}$, we get the same
recursive sequence defined by (12)–(13). Therefore, since
the minimum possible $\gamma_{k,i}$ is achieved when $r_i = 1$, we
conclude that $\delta_k^* \le \min_{i=1,\dots,N} \gamma_{k,i}$ for any k ≥ 0. This
shows that $\delta_k^*$ is well defined in the context of Assumption
2b. Similarly, it can be shown that $\Gamma_k^*$ is also well defined
in the context of Assumption 2b. Now, Lemma 4 implies
that $\Gamma_k^* = \left(1 + \frac{\eta - 2c}{L}\right)\delta_k^*$ for any k ≥ 0, which shows that
Assumption 2b is satisfied, since $\beta = \frac{\eta - 2c}{L}$ and $0 < c < \frac{\eta}{2}$ imply $0 < \beta < \frac{\eta}{L}$.
(c) In view of Proposition 1, to show the almost-sure convergence,
it suffices to show that Assumption 2 holds. Part (b)
implies that Assumption 2b holds for the specified choices.
Since $\gamma_{k,i}$ is generated by a recursion of the form (19) for each i, Assumption 2a
holds by Proposition 3 in [1].
(d) Since $D < \frac{\sqrt{2}\,\nu}{L}$, it follows that $e_0 = D^2 < \frac{2\nu^2}{L^2}$, which shows
that the conditions of Proposition 2 are satisfied. ■
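The rules (18)–(19) need only shared constants (c, η, L, ν, D) plus each player's private $r_i$. The sketch below simulates four hypothetical players and checks the claims of parts (a) and (b). First, the ratios $\gamma_{k,i}/r_i$ coincide. Second, every player's step lies between $\delta_k^*$ and $\Gamma_k^* = (1 + \beta)\delta_k^*$. The numbers η = 0.8, L = 1.2, ν = 1, D = 1, c = 0.3, and the chosen $r_i$ are illustrative assumptions that satisfy $0 < c < \eta/2$ and $D < \sqrt{2}\nu/L$.

```python
def player_steps(r, c, eta, L, nu, D, num_terms):
    """Stepsizes gamma_{k,i}, k = 0..num_terms-1, of one player via (18)-(19)."""
    beta = (eta - 2 * c) / L
    g = r * c * D ** 2 / ((1 + beta) ** 2 * nu ** 2)   # eq. (18)
    seq = [g]
    for _ in range(num_terms - 1):
        g = g * (1 - (c / r) * g)                      # eq. (19): own previous step only
        seq.append(g)
    return seq


def delta_star(num_terms, eta, L, beta, nu, e0):
    """Lower-bound sequence (12)-(13), used here only to check part (b)."""
    a = eta - beta * L
    d = a * e0 / (2 * (1 + beta) ** 2 * nu ** 2)
    seq = [d]
    for _ in range(num_terms - 1):
        d = d * (1 - 0.5 * a * d)
        seq.append(d)
    return seq


# Assumed constants: 0 < c < eta / 2 and D < sqrt(2) * nu / L.
eta, L, nu, D, c = 0.8, 1.2, 1.0, 1.0, 0.3
beta = (eta - 2 * c) / L
rs = [1.0, 1.05, 1.1, 1 + beta]     # each player picks its own r_i in [1, 1 + beta]
K = 500
steps = [player_steps(r, c, eta, L, nu, D, K) for r in rs]
lower = delta_star(K, eta, L, beta, nu, D ** 2)
```

Apart from the shared constants, no player needs to know another player's $r_i$ or stepsize history. This is the limited coordination referred to in the introduction.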

V. Numerical results 

In this section, we report the results of our numerical
experiments on a stochastic bandwidth-sharing problem in
communication networks (Sec. V-A). We compare the performance
of the distributed adaptive stepsize SA scheme
(DASA) given by (18)–(19) with that of SA schemes with
harmonic stepsize sequences (HSA), where agents use the
stepsize θ/k at iteration k. More precisely, we consider three
different values of the parameter θ, i.e., θ = 0.1, 1, and 10.
This diversity of choices allows us to observe the sensitivity
of the HSA scheme to different settings of the parameter θ.

A. A bandwidth-sharing problem in computer networks 

We consider a communication network where users com- 
pete for the bandwidth. Such a problem can be captured 
by an optimization framework (cf. [18]). Motivated by this 
model, we consider a network with 16 nodes, 20 links and 
5 users. Figure 1 shows the configuration of this network. 
Users have access to different routes as shown in Figure 1. 




Fig. 1: The network 

For example, user 1 can access routes 1, 2, and 3. Each 
user is characterized by a cost function. Additionally, there 
is a congestion cost function that depends on the aggregate 
flow. More specifically, the cost function of user i with flow
rate (bandwidth) $x_i$ is defined by

$$f_i(x_i, \xi_i) = -\sum_{r \in \mathcal{R}(i)} \xi_i(r)\log(1 + x_i(r)),$$

for i = 1, ..., 5, where $x = (x_1; \dots; x_5)$ is the flow decision
vector of the users, $\xi = (\xi_1; \dots; \xi_5)$ is a random parameter
corresponding to the different users, $\mathcal{R}(i) = \{1, 2, \dots, n_i\}$
is the set of routes assigned to the ith user, and $x_i(r)$ and $\xi_i(r)$
are the rth elements of the decision vector $x_i$ and the random
vector $\xi_i$, respectively. We assume that $\xi_i(r)$ is drawn from
a uniform distribution for each i and r, and that the links have
limited capacities given by b.

We may define the routing matrix A that describes the
relation between the set of routes $\mathcal{R} = \{1, 2, \dots, 9\}$ and the set
of links $\mathcal{L} = \{1, 2, \dots, 20\}$. Assume that $A_{lr} = 1$ if route
$r \in \mathcal{R}$ goes through link $l \in \mathcal{L}$, and $A_{lr} = 0$ otherwise.
Using this matrix, the capacity constraints of the links can
be described by $Ax \le b$.
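As a sketch of how the routing matrix and the capacity constraints are assembled, the code below builds A from a list giving, for each route, the links it traverses. It then checks $Ax \le b$. The three-route, four-link topology and the capacities are hypothetical and do not represent the 16-node network of Fig. 1. Indices are 0-based in the code.

```python
def routing_matrix(route_links, num_links):
    """A[l][r] = 1 if route r uses link l, else 0 (0-based indices)."""
    A = [[0] * len(route_links) for _ in range(num_links)]
    for r, links in enumerate(route_links):
        for l in links:
            A[l][r] = 1
    return A


def matvec(A, x):
    """Link loads Ax: total flow crossing each link."""
    return [sum(a_lr * x_r for a_lr, x_r in zip(row, x)) for row in A]


def capacity_feasible(A, x, b):
    """Check the link-capacity constraints Ax <= b."""
    return all(load <= cap for load, cap in zip(matvec(A, x), b))


# Hypothetical topology: route 0 uses links {0, 1}, route 1 uses {1, 2}, route 2 uses {2, 3}.
route_links = [[0, 1], [1, 2], [2, 3]]
A = routing_matrix(route_links, num_links=4)
b = [1.0, 1.5, 1.5, 1.0]
```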

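We formulate this model as a stochastic optimization problem given by

$$\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{N} \mathbb{E}[f_i(x_i, \xi_i)] + c(x) \\
\text{subject to} \quad & Ax \le b, \quad x \ge 0,
\end{aligned} \tag{20}$$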



where $c(x)$ is the network congestion cost and $N = 5$ is the number of users. We consider a congestion cost of the form $c(x) = \|Ax\|^2$. Problem (20) is a convex optimization problem, and its optimality conditions can be stated as the variational inequality $\nabla f(x^*)^T(x - x^*) \ge 0$ for all $x$ satisfying $Ax \le b$ and $x \ge 0$, where $f(x) = \sum_{i=1}^{N} \mathbb{E}[f_i(x_i, \xi_i)] + c(x)$. Using our notation in Sec. II, we have

$$F(x) = \left(-\frac{\bar\xi_1(1)}{1 + x_1(1)}, \ldots, -\frac{\bar\xi_5(2)}{1 + x_5(2)}\right)^T + 2A^TAx,$$

where $\bar\xi_i(r) = \mathbb{E}[\xi_i(r)]$ for any $i = 1, \ldots, 5$ and $r = 1, \ldots, n_i$. It can be shown that the mapping $F$ is strongly monotone and Lipschitz continuous with known constants (cf. [19]).
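To make the expression for $F$ concrete, the sketch below evaluates $F(x)$ for a stacked flow vector, given mean values $\bar\xi$ and a routing matrix supplied by the caller. It also checks numerically that $(F(x) - F(y))^T(x - y) \ge 0$ at random nonnegative points. The data are hypothetical, and this is only a sanity check. It is not a proof of the strong monotonicity established in [19].

```python
import random
from typing import List, Sequence


def mapping_F(x: Sequence[float], xi_bar: Sequence[float],
              A: Sequence[Sequence[int]]) -> List[float]:
    """F(x) = (-xi_bar[r] / (1 + x[r]))_r + 2 A^T A x for the stacked flow vector x."""
    Ax = [sum(a * v for a, v in zip(row, x)) for row in A]            # link loads
    grad_c = [2.0 * sum(A[link][r] * Ax[link] for link in range(len(A)))
              for r in range(len(x))]                                  # gradient of ||Ax||^2
    return [-w / (1.0 + v) + g for v, w, g in zip(x, xi_bar, grad_c)]


def monotonicity_gap(x, y, xi_bar, A) -> float:
    """(F(x) - F(y))^T (x - y); nonnegative whenever F is monotone."""
    Fx, Fy = mapping_F(x, xi_bar, A), mapping_F(y, xi_bar, A)
    return sum((a - b) * (u - v) for a, b, u, v in zip(Fx, Fy, x, y))


# Hypothetical data: 3 routes over 4 links, positive mean weights.
A = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
xi_bar = [1.0, 0.8, 1.2]
rng = random.Random(1)
for _ in range(5):
    x = [rng.uniform(0.0, 2.0) for _ in range(3)]
    y = [rng.uniform(0.0, 2.0) for _ in range(3)]
    print(f"gap = {monotonicity_gap(x, y, xi_bar, A):.4f}")
```

Because $F = \nabla f$ and $f$ is convex for $x \ge 0$ when $\bar\xi > 0$, the printed gaps should be nonnegative.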
We solve the bandwidth-sharing problem for the 12 parameter settings shown in Table I. Our model has 4 parameters that scale the problem: $m_b$ denotes the multiplier of the capacity vector $b$, $m_c$ denotes the multiplier of the congestion cost function $c(x)$, and $m_\xi$ and $d_\xi$ are two multipliers that parametrize the random variable $\xi$. $S(i)$ denotes the $i$-th setting of parameters. For each of these 4 parameters, we consider 3 settings in which that parameter changes while the others are fixed. This allows us to observe the sensitivity of the algorithms with
respect to each of these parameters.

S(i)   m_b    m_c    m_ξ   d_ξ
1      1      1      5     2
2      0.1    1      5     2
3      0.01   1      5     2
4      0.1           2     1
5      0.1    1      2     1
6      0.1    0.5    2     1
7             1      1     5
8             1      2     5
9             1      5     5
10            0.01   1     1
11            0.01   1     2
12            0.01   1     5

TABLE I: Parameter settings

The SA algorithms are terminated after 4000 iterations. To measure the error of the schemes, we run each scheme 25 times and then compute the mean squared error (MSE) using the metric

$$\frac{1}{25}\sum_{i=1}^{25} \|x_k^i - x^*\|^2 \quad \text{for any } k = 1, \ldots, 4000,$$

where the superscript $i$ denotes the $i$-th sample (run). Tables II and III show the 90% confidence intervals (CIs) of the error for the DASA and HSA schemes.
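The error metric above can be computed as follows. This sketch assumes the iterates $x_k^i$ of the 25 runs at a fixed $k$ and the solution $x^*$ are available as lists. The text does not state how the 90% CIs were constructed, so the normal approximation used here (mean plus or minus about 1.645 standard errors) is an assumption, as are the function names.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def squared_error(x: Sequence[float], x_star: Sequence[float]) -> float:
    """||x - x*||^2 for a single run."""
    return sum((a - b) ** 2 for a, b in zip(x, x_star))


def mse_with_ci(iterates: List[Sequence[float]], x_star: Sequence[float],
                level: float = 0.90) -> Tuple[float, float, float]:
    """Return (MSE, lower, upper): the sample mean of ||x_k^i - x*||^2 over runs i
    and a normal-approximation CI for it (requires at least two runs)."""
    errors = [squared_error(x, x_star) for x in iterates]
    mse = statistics.fmean(errors)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645 for a 90% CI
    half_width = z * statistics.stdev(errors) / math.sqrt(len(errors))
    return mse, mse - half_width, mse + half_width


# Hypothetical usage: 25 noisy final iterates scattered around x* = (1, 2).
rng = random.Random(7)
x_star = [1.0, 2.0]
runs = [[c + rng.gauss(0.0, 1e-3) for c in x_star] for _ in range(25)]
print(mse_with_ci(runs, x_star))
```

With only 25 runs, a Student-t quantile (about 1.711 for 24 degrees of freedom) would give slightly wider intervals than the normal quantile used here.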

S(i)   DASA                   HSA (θ = 0.1)
1      [2.97e-6, 4.66e-6]     [1.52e-6, 2.37e-6]
2      [2.97e-6, 4.66e-6]     [1.52e-6, 2.37e-6]
3      [1.15e-7, 3.04e-7]     [2.12e-8, 4.92e-8]
4      [4.39e-7, 6.55e-7]     [1.33e-6, 1.80e-6]
5      [1.29e-6, 1.97e-6]     [9.00e-6, 1.20e-5]
6      [3.44e-6, 5.36e-6]     [2.26e-4, 2.53e-4]
7      [4.29e-5, 6.40e-5]     [7.92e-5, 1.49e-4]
8      [3.18e-5, 4.83e-5]     [3.46e-5, 6.07e-5]
9      [1.83e-5, 2.88e-5]     [6.12e-6, 9.99e-6]
10     [3.82e-4, 5.91e-4]     [2.86e+1, 2.86e+1]
11     [9.81e-4, 1.44e-3]     [2.86e+1, 2.86e+1]
12     [6.26e-3, 8.44e-3]     [2.85e+1, 2.86e+1]

TABLE II: 90% CIs for DASA and HSA schemes - Part I

S(i)   HSA (θ = 1)            HSA (θ = 10)
1      [1.70e-6, 2.97e-6]     [1.33e-5, 1.81e-5]
2      [1.70e-6, 2.97e-6]     [1.33e-5, 1.81e-5]
3      [4.66e-8, 1.17e-7]     [8.07e-7, 2.43e-6]
4      [4.71e-7, 8.75e-7]     [3.84e-6, 5.38e-6]
5      [7.88e-7, 1.36e-6]     [5.61e-6, 7.98e-6]
6      [1.25e-6, 1.99e-6]     [7.34e-6, 1.12e-5]
7      [2.83e-5, 4.75e-5]     [1.84e-4, 2.75e-4]
8      [1.97e-5, 3.39e-5]     [1.40e-4, 1.99e-4]
9      [1.06e-5, 1.85e-5]     [8.33e-5, 1.13e-4]
10     [5.50e-1, 5.70e-1]     [7.23e-5, 9.64e-5]
11     [5.45e-1, 5.85e-1]     [2.85e-4, 3.80e-4]
12     [5.47e-1, 6.44e-1]     [1.77e-3, 2.36e-3]

TABLE III: 90% CIs for DASA and HSA schemes - Part II

Insights: We observe that the DASA scheme performs favorably and is far more robust than the HSA schemes across the different choices of $\theta$. Importantly, in most of the settings, the MSE of DASA is close to that of the HSA scheme with the minimum MSE. Note that when $\theta = 1$ or $\theta = 10$, the stepsize $\theta/k$ does not lie, for small $k$, in the interval of admissible stepsizes given in Prop. 2 (whose upper endpoint has $(1+\beta)^2L^2$ in its denominator), and is therefore not feasible in the sense of Prop. 2. Comparing
the performance of each HSA scheme across settings, we observe that the HSA schemes are fairly sensitive to the choice of parameters. For example, HSA with $\theta = 0.1$ performs very well in settings S(1), S(2), and S(3), while its performance deteriorates markedly in settings S(10), S(11), and S(12). A similar observation holds for the other two HSA schemes. Figures 2 and 3 illustrate this behavior for settings S(4) and S(11).
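The sensitivity of harmonic stepsizes to $\theta$ can be reproduced on a much smaller problem. The sketch below is not the bandwidth-sharing experiment. It runs projected SA with $\gamma_k = \theta/k$ on a hypothetical two-dimensional map $F(x) = D(x - x^*)$ with noisy evaluations, and it projects onto the box $[0, 1]^2$ rather than onto $\{x : Ax \le b,\ x \ge 0\}$. All names and data are illustrative assumptions. With a small monotonicity constant, one would expect a small $\theta$ to make little progress. A large $\theta$ should instead make the early iterates jump between faces of the box.

```python
import random
from typing import List, Sequence


def project_box(x: Sequence[float], upper: float) -> List[float]:
    """Euclidean projection onto the box [0, upper]^n (componentwise clipping)."""
    return [min(max(v, 0.0), upper) for v in x]


def squared_error(x: Sequence[float], x_star: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, x_star))


def run_hsa(theta: float, d: Sequence[float], x_star: Sequence[float], upper: float,
            num_iters: int, noise_std: float, seed: int = 0) -> List[float]:
    """Projected SA x_{k+1} = P(x_k - (theta/k) * (F(x_k) + w_k)) on the toy map
    F(x) = d * (x - x_star) (componentwise), with Gaussian noise w_k."""
    rng = random.Random(seed)
    x = [upper] * len(d)                    # start at a corner of the box
    for k in range(1, num_iters + 1):
        gamma = theta / k                   # harmonic stepsize
        g = [di * (xi - si) + rng.gauss(0.0, noise_std)
             for di, xi, si in zip(d, x, x_star)]
        x = project_box([xi - gamma * gi for xi, gi in zip(x, g)], upper)
    return x


# Hypothetical data: monotonicity constant 0.05, Lipschitz constant 1.0.
x_star = [0.3, 0.7]
d = [0.05, 1.0]
for theta in (0.1, 1.0, 10.0):
    x_final = run_hsa(theta, d, x_star, upper=1.0, num_iters=4000, noise_std=0.1)
    print(f"theta={theta}: squared error after 4000 iterations = "
          f"{squared_error(x_final, x_star):.2e}")
```

Since $x^*$ lies inside the box, the projection does not change the solution. It only keeps the iterates bounded when the early steps overshoot.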
Fig. 2: DASA vs. HSA schemes - Setting S(4)
Fig. 3: DASA vs. HSA schemes - Setting S(11)

VI. Concluding remarks 

We considered distributed monotone stochastic Nash games in which each player minimizes a convex function over a closed convex set. We first formulated the problem as a stochastic VI and then showed that, under suitable conditions, for a strongly monotone and Lipschitz mapping, the SA scheme converges almost surely to the solution. Next, motivated by the naive stepsize choices of standard SA algorithms, we proposed a class of distributed adaptive steplength rules in which each player can choose his own stepsize from a specified range, independently of the other players. We showed that this scheme converges almost surely and also minimizes a suitably defined error bound of the SA algorithm. The numerical experiments reported in Section V support these findings and indicate that the scheme is considerably more robust to the problem parameters than SA schemes with harmonic stepsizes.

References 

[1] F. Yousefian, A. Nedic, and U. V. Shanbhag, "On stochastic gradient and subgradient methods with adaptive steplength sequences," Automatica, vol. 48, no. 1, pp. 56-67, 2012; an extended version of the paper is available at: http://arxiv.org/abs/1105.4549.

[2] J. Koshal, A. Nedic, and U. V. Shanbhag, "Single timescale regularized stochastic approximation schemes for monotone Nash games under uncertainty," Proceedings of the IEEE Conference on Decision and Control (CDC), pp. 231-236, 2010.

[3] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Statistics, vol. 22, pp. 400-407, 1951.

[4] Y. M. Ermoliev, Stochastic Programming Methods. Moscow: Nauka, 1976.

[5] Y. M. Ermoliev, "Stochastic quasigradient methods and their application to system optimization," Stochastics, vol. 9, pp. 1-36, 1983.

[6] Y. M. Ermoliev, "Stochastic quasigradient methods," in Numerical Techniques for Stochastic Optimization. Springer-Verlag, 1983, pp. 141-185.

[7] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008.

[8] H. J. Kushner and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications. Springer New York, 2003.

[9] H. Jiang and H. Xu, "Stochastic approximation approaches to the stochastic variational inequality problem," IEEE Transactions on Automatic Control, vol. 53, no. 6, pp. 1462-1475, 2008.

[10] M. Vrahatis, G. Androulakis, J. Lambrinos, and G. Magoulas, "A class of gradient unconstrained minimization algorithms with adaptive stepsize," Journal of Computational and Applied Mathematics, vol. 114, pp. 367-386, 2000.

[11] J. C. Spall, "Adaptive stochastic approximation by the simultaneous perturbation method," IEEE Transactions on Automatic Control, vol. 45, no. 10, pp. 1839-1853, 2000.

[12] D. Cicek, M. Broadie, and A. Zeevi, "General bounds and finite-time performance improvement for the Kiefer-Wolfowitz stochastic approximation algorithm," to appear in Operations Research, 2011.

[13] S. S. Ram, A. Nedic, and V. V. Veeravalli, "Incremental stochastic subgradient algorithms for convex optimization," SIAM Journal on Optimization, vol. 20, no. 2, pp. 691-717, 2009.

[14] M. Jimenez-Lizarraga, A. Poznyak, and M. Alcorta, "Leader-follower strategies for a multi-plant differential game," Proceedings of the American Control Conference, 2008.

[15] F. Yousefian, A. Nedic, and U. Shanbhag, "A regularized adaptive steplength stochastic approximation scheme for monotone stochastic variational inequalities," Proceedings of the 2011 Winter Simulation Conference, pp. 4110-4121, 2011.

[16] B. Polyak, Introduction to Optimization. New York: Optimization Software, Inc., 1987.

[17] F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems. Vols. I, II, ser. Springer Series in Operations Research. New York: Springer-Verlag, 2003.

[18] S.-W. Cho and A. Goel, "Bandwidth allocation in networks: a single dual update subroutine for multiple objectives," Combinatorial and Algorithmic Aspects of Networking, vol. 3405, pp. 28-41, 2005.

[19] F. Yousefian, A. Nedic, and U. Shanbhag, "Distributed adaptive steplength stochastic approximation schemes for Cartesian stochastic variational inequality problems," submitted to Mathematical Programming, January 2013.






